Overview

Dataset statistics

Number of variables: 13
Number of observations: 81
Missing cells: 170
Missing cells (%): 16.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.4 KiB
Average record size in memory: 105.6 B

Variable types

NUM: 10
CAT: 3

Warnings

High correlation: Urban area[12] Population is highly correlated with Metropolitan area[d] Population
High correlation: Metropolitan area[d] Population is highly correlated with Urban area[12] Population
High correlation: City proper[c] Definition is highly correlated with City[a]
High correlation: City[a] is highly correlated with City proper[c] Definition
Missing: City proper[c] Definition has 6 (7.4%) missing values
Missing: City proper[c] Population has 7 (8.6%) missing values
Missing: City proper[c] Area (km2) has 7 (8.6%) missing values
Missing: City proper[c] Density (/km2) has 7 (8.6%) missing values
Missing: Metropolitan area[d] Population has 40 (49.4%) missing values
Missing: Metropolitan area[d] Area (km2) has 50 (61.7%) missing values
Missing: Metropolitan area[d] Density (/km2) has 50 (61.7%) missing values
Missing: Urban area[12] Population has 1 (1.2%) missing values
Missing: Urban area[12] Area (km2) has 1 (1.2%) missing values
Missing: Urban area[12] Density (/km2) has 1 (1.2%) missing values
Unique: City[a] has unique values
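
The per-column missing counts in the warnings can be cross-checked against the overview totals. The following is a minimal sketch, assuming "Missing cells (%)" is the number of missing cells divided by variables times observations; the dictionary is copied by hand from the warnings above, and the variable names are illustrative only.

```python
# Missing-value counts per column, copied from the warnings in this report.
missing_per_column = {
    "City proper[c] Definition": 6,
    "City proper[c] Population": 7,
    "City proper[c] Area (km2)": 7,
    "City proper[c] Density (/km2)": 7,
    "Metropolitan area[d] Population": 40,
    "Metropolitan area[d] Area (km2)": 50,
    "Metropolitan area[d] Density (/km2)": 50,
    "Urban area[12] Population": 1,
    "Urban area[12] Area (km2)": 1,
    "Urban area[12] Density (/km2)": 1,
}

n_variables = 13      # "Number of variables" in the overview
n_observations = 81   # "Number of observations" in the overview

total_missing = sum(missing_per_column.values())
total_cells = n_variables * n_observations
# Assumption: the percentage is taken over all cells of the table.
missing_pct = 100 * total_missing / total_cells

print(total_missing, total_cells, f"{missing_pct:.1f}%")
```

Under that assumption the per-column counts add up to the reported number of missing cells, and the ratio rounds to the reported percentage.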

Reproduction

Analysis started: 2021-08-17 18:18:03.860784
Analysis finished: 2021-08-17 18:18:20.944068
Duration: 17.08 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

City[a]
Categorical

HIGH CORRELATION
UNIQUE

Distinct: 81
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 648.0 B

Value | Count | Frequency (%)
London | 1 | 1.2%
Mumbai | 1 | 1.2%
Bangkok | 1 | 1.2%
Rio de Janeiro | 1 | 1.2%
Dongguan | 1 | 1.2%
Hyderabad | 1 | 1.2%
Lima | 1 | 1.2%
Beijing | 1 | 1.2%
Singapore | 1 | 1.2%
Shenyang | 1 | 1.2%
Other values (71) | 71 | 87.7%

[Figure: Frequencies of value counts]

Unique

Unique: 81
Unique (%): 100.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 16
Median length: 7
Mean length: 7.75308642
Min length: 4

Country
Categorical

Distinct: 36
Distinct (%): 44.4%
Missing: 0
Missing (%): 0.0%
Memory size: 648.0 B

Value | Count | Frequency (%)
China | 20 | 24.7%
India | 9 | 11.1%
United States | 9 | 11.1%
Japan | 4 | 4.9%
Brazil | 3 | 3.7%
Pakistan | 2 | 2.5%
Russia | 2 | 2.5%
Spain | 2 | 2.5%
Mexico | 2 | 2.5%
Egypt | 2 | 2.5%
Other values (26) | 26 | 32.1%

[Figure: Frequencies of value counts]

Unique

Unique: 26
Unique (%): 32.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 14
Median length: 5
Mean length: 6.962962963
Min length: 4

UN 2018 population estimates[b]
Real number (ℝ≥0)

Distinct: 79
Distinct (%): 97.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10736739.11
Minimum: 5023000
Maximum: 37400068
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 5023000
5-th percentile: 5207000
Q1: 6115000
median: 8245000
Q3: 13215000
95-th percentile: 21650000
Maximum: 37400068
Range: 32377068
Interquartile range (IQR): 7100000

Descriptive statistics

Standard deviation: 6276119.813
Coefficient of variation (CV): 0.5845461781
Kurtosis: 3.741088261
Mean: 10736739.11
Median Absolute Deviation (MAD): 2673000
Skewness: 1.784842227
Sum: 869675868
Variance: 3.938967991e+13
Monotonicity: Decreasing
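
Several of the figures above are derived from others in the same block. The sketch below recomputes them from the reported values for this variable, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). The variable names are illustrative, and the inputs are the rounded values printed in this report, so agreement is only expected up to rounding.

```python
import math

# Reported values for "UN 2018 population estimates[b]".
minimum, maximum = 5023000, 37400068
q1, q3 = 6115000, 13215000
mean = 10736739.11
std = 6276119.813

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range, iqr, round(cv, 10), f"{variance:.9e}")
```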
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
7236000 | 2 | 2.5%
6115000 | 2 | 2.5%
5972000 | 1 | 1.2%
21650000 | 1 | 1.2%
5157000 | 1 | 1.2%
5695000 | 1 | 1.2%
13171000 | 1 | 1.2%
11908000 | 1 | 1.2%
8864000 | 1 | 1.2%
12410000 | 1 | 1.2%
Other values (69) | 69 | 85.2%

Minimum 5 values

Value | Count | Frequency (%)
5023000 | 1 | 1.2%
5052000 | 1 | 1.2%
5086000 | 1 | 1.2%
5157000 | 1 | 1.2%
5207000 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
37400068 | 1 | 1.2%
28514000 | 1 | 1.2%
25674800 | 1 | 1.2%
25582000 | 1 | 1.2%
21650000 | 1 | 1.2%

City proper[c] Definition
Categorical

HIGH CORRELATION
MISSING

Distinct: 25
Distinct (%): 33.3%
Missing: 6
Missing (%): 7.4%
Memory size: 648.0 B

Value | Count | Frequency (%)
Municipality | 20 | 24.7%
City (sub - provincial) | 14 | 17.3%
City | 8 | 9.9%
Capital city | 4 | 4.9%
Designated city | 3 | 3.7%
Urban governorate | 3 | 3.7%
Metropolitan municipality | 3 | 3.7%
Federal city | 2 | 2.5%
Metropolitan city | 2 | 2.5%
Country | 1 | 1.2%
Other values (15) | 15 | 18.5%
(Missing) | 6 | 7.4%

[Figure: Frequencies of value counts]

Unique

Unique: 16
Unique (%): 21.3%

[Figure: Histogram of lengths of the category]

Length

Max length: 29
Median length: 12
Mean length: 15.02469136
Min length: 3

City proper[c] Population
Real number (ℝ≥0)

MISSING

Distinct: 74
Distinct (%): 100.0%
Missing: 7
Missing (%): 8.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 7783677.095
Minimum: 236453
Maximum: 32054159
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 236453
5-th percentile: 680455.05
Q1: 2726647.25
median: 7697000
Q3: 10515511.75
95-th percentile: 16292687.25
Maximum: 32054159
Range: 31817706
Interquartile range (IQR): 7788864.5

Descriptive statistics

Standard deviation: 5873206.433
Coefficient of variation (CV): 0.7545542244
Kurtosis: 3.493254746
Mean: 7783677.095
Median Absolute Deviation (MAD): 4097937
Skewness: 1.408052735
Sum: 575992105
Variance: 3.449455381e+13
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
2165867 | 1 | 1.2%
13515271 | 1 | 1.2%
2725006 | 1 | 1.2%
8126755 | 1 | 1.2%
16753235 | 1 | 1.2%
3054300 | 1 | 1.2%
13200000 | 1 | 1.2%
16044700 | 1 | 1.2%
12528300 | 1 | 1.2%
6727000 | 1 | 1.2%
Other values (64) | 64 | 79.0%
(Missing) | 7 | 8.6%

Minimum 5 values

Value | Count | Frequency (%)
236453 | 1 | 1.2%
420003 | 1 | 1.2%
470914 | 1 | 1.2%
639598 | 1 | 1.2%
702455 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
32054159 | 1 | 1.2%
24870895 | 1 | 1.2%
21893095 | 1 | 1.2%
16753235 | 1 | 1.2%
16044700 | 1 | 1.2%

City proper[c] Area (km2)
Real number (ℝ≥0)

MISSING

Distinct: 74
Distinct (%): 100.0%
Missing: 7
Missing (%): 8.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 4935.378378
Minimum: 22
Maximum: 82403
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 22
5-th percentile: 103.6
Q1: 358
median: 1307
Q3: 3768.5
95-th percentile: 16475.75
Maximum: 82403
Range: 82381
Interquartile range (IQR): 3410.5

Descriptive statistics

Standard deviation: 11766.34136
Coefficient of variation (CV): 2.384080906
Kurtosis: 29.10374431
Mean: 4935.378378
Median Absolute Deviation (MAD): 1006
Skewness: 5.038478978
Sum: 365218
Variance: 138446788.9
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
10135 | 1 | 1.2%
464 | 1 | 1.2%
650 | 1 | 1.2%
1572 | 1 | 1.2%
751 | 1 | 1.2%
589 | 1 | 1.2%
82403 | 1 | 1.2%
116 | 1 | 1.2%
243 | 1 | 1.2%
2672 | 1 | 1.2%
Other values (64) | 64 | 79.0%
(Missing) | 7 | 8.6%

Minimum 5 values

Value | Count | Frequency (%)
22 | 1 | 1.2%
43 | 1 | 1.2%
93 | 1 | 1.2%
101 | 1 | 1.2%
105 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
82403 | 1 | 1.2%
53068 | 1 | 1.2%
22142 | 1 | 1.2%
16596 | 1 | 1.2%
16411 | 1 | 1.2%

City proper[c] Density (/km2)
Real number (ℝ≥0)

MISSING

Distinct: 74
Distinct (%): 100.0%
Missing: 7
Missing (%): 8.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 7261.108108
Minimum: 29
Maximum: 41399
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 29
5-th percentile: 614.85
Q1: 1890
median: 5163
Q3: 10756.25
95-th percentile: 20541.9
Maximum: 41399
Range: 41370
Interquartile range (IQR): 8866.25

Descriptive statistics

Standard deviation: 7196.955453
Coefficient of variation (CV): 0.9911648947
Kurtosis: 6.187297639
Mean: 7261.108108
Median Absolute Deviation (MAD): 3747.5
Skewness: 2.057300883
Sum: 537322
Variance: 51796167.8
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
10759 | 1 | 1.2%
26349 | 1 | 1.2%
1525 | 1 | 1.2%
4336 | 1 | 1.2%
1186 | 1 | 1.2%
29 | 1 | 1.2%
849 | 1 | 1.2%
20694 | 1 | 1.2%
18671 | 1 | 1.2%
6202 | 1 | 1.2%
Other values (64) | 64 | 79.0%
(Missing) | 7 | 8.6%

Minimum 5 values

Value | Count | Frequency (%)
29 | 1 | 1.2%
200 | 1 | 1.2%
389 | 1 | 1.2%
570 | 1 | 1.2%
639 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
41399 | 1 | 1.2%
26349 | 1 | 1.2%
21935 | 1 | 1.2%
20694 | 1 | 1.2%
20460 | 1 | 1.2%

Metropolitan area[d] Population
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 41
Distinct (%): 100.0%
Missing: 40
Missing (%): 49.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 13498899.56
Minimum: 5156217
Maximum: 37274000
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 5156217
5-th percentile: 5286642
Q1: 6641649
median: 12545272
Q3: 19303000
95-th percentile: 29000000
Maximum: 37274000
Range: 32117783
Interquartile range (IQR): 12661351

Descriptive statistics

Standard deviation: 8183468.659
Coefficient of variation (CV): 0.60623228
Kurtosis: 0.8646083074
Mean: 13498899.56
Median Absolute Deviation (MAD): 6245272
Skewness: 1.149962656
Sum: 553454882
Variance: 6.696915929e+13
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=41)]

Common values

Value | Count | Frequency (%)
12644321 | 1 | 1.2%
5474482 | 1 | 1.2%
6997384 | 1 | 1.2%
21804515 | 1 | 1.2%
12806866 | 1 | 1.2%
6300000 | 1 | 1.2%
21734682 | 1 | 1.2%
7200000 | 1 | 1.2%
6096120 | 1 | 1.2%
5949951 | 1 | 1.2%
Other values (31) | 31 | 38.3%
(Missing) | 40 | 49.4%

Minimum 5 values

Value | Count | Frequency (%)
5156217 | 1 | 1.2%
5274321 | 1 | 1.2%
5286642 | 1 | 1.2%
5474482 | 1 | 1.2%
5928040 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
37274000 | 1 | 1.2%
33430285 | 1 | 1.2%
29000000 | 1 | 1.2%
25514000 | 1 | 1.2%
24400000 | 1 | 1.2%

Metropolitan area[d] Area (km2)
Real number (ℝ≥0)

MISSING

Distinct: 31
Distinct (%): 100.0%
Missing: 50
Missing (%): 61.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 9579.709677
Minimum: 620
Maximum: 22463
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 620
5-th percentile: 1511
Q1: 4067.5
median: 7762
Q3: 14427.5
95-th percentile: 21542.5
Maximum: 22463
Range: 21843
Interquartile range (IQR): 10360

Descriptive statistics

Standard deviation: 6512.7113
Coefficient of variation (CV): 0.6798443293
Kurtosis: -0.8408447092
Mean: 9579.709677
Median Absolute Deviation (MAD): 4797
Skewness: 0.5625595673
Sum: 296971
Variance: 42415408.48
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=31)]

Common values

Value | Count | Frequency (%)
5327 | 1 | 1.2%
2819 | 1 | 1.2%
3780 | 1 | 1.2%
620 | 1 | 1.2%
1171 | 1 | 1.2%
7256 | 1 | 1.2%
3560 | 1 | 1.2%
15890 | 1 | 1.2%
15403 | 1 | 1.2%
17009 | 1 | 1.2%
Other values (21) | 21 | 25.9%
(Missing) | 50 | 61.7%

Minimum 5 values

Value | Count | Frequency (%)
620 | 1 | 1.2%
1171 | 1 | 1.2%
1851 | 1 | 1.2%
2793 | 1 | 1.2%
2819 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
22463 | 1 | 1.2%
21690 | 1 | 1.2%
21395 | 1 | 1.2%
18640 | 1 | 1.2%
17315 | 1 | 1.2%

Metropolitan area[d] Density (/km2)
Real number (ℝ≥0)

MISSING

Distinct: 31
Distinct (%): 100.0%
Missing: 50
Missing (%): 61.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 3349.225806
Minimum: 274
Maximum: 20770
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 274
5-th percentile: 330
Q1: 774
median: 2094
Q3: 3083.5
95-th percentile: 13129.5
Maximum: 20770
Range: 20496
Interquartile range (IQR): 2309.5

Descriptive statistics

Standard deviation: 4746.479437
Coefficient of variation (CV): 1.417187049
Kurtosis: 7.833121455
Mean: 3349.225806
Median Absolute Deviation (MAD): 1301
Skewness: 2.771906778
Sum: 103826
Variance: 22529067.05
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=31)]

Common values

Value | Count | Frequency (%)
7583 | 1 | 1.2%
5603 | 1 | 1.2%
2180 | 1 | 1.2%
2772 | 1 | 1.2%
2374 | 1 | 1.2%
1058 | 1 | 1.2%
2094 | 1 | 1.2%
1288 | 1 | 1.2%
510 | 1 | 1.2%
462 | 1 | 1.2%
Other values (21) | 21 | 25.9%
(Missing) | 50 | 61.7%

Minimum 5 values

Value | Count | Frequency (%)
274 | 1 | 1.2%
327 | 1 | 1.2%
333 | 1 | 1.2%
368 | 1 | 1.2%
388 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
20770 | 1 | 1.2%
17933 | 1 | 1.2%
8326 | 1 | 1.2%
7583 | 1 | 1.2%
5603 | 1 | 1.2%

Urban area[12] Population
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 80
Distinct (%): 100.0%
Missing: 1
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 11852275
Minimum: 2280000
Maximum: 39105000
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 2280000
5-th percentile: 4727400
Q1: 6521000
median: 9143500
Q3: 15479500
95-th percentile: 22568800
Maximum: 39105000
Range: 36825000
Interquartile range (IQR): 8958500

Descriptive statistics

Standard deviation: 7303674.44
Coefficient of variation (CV): 0.6162255297
Kurtosis: 2.529618677
Mean: 11852275
Median Absolute Deviation (MAD): 3791000
Skewness: 1.479104659
Sum: 948182000
Variance: 5.334366033e+13
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
14678000 | 1 | 1.2%
11564000 | 1 | 1.2%
5103000 | 1 | 1.2%
11027000 | 1 | 1.2%
22394000 | 1 | 1.2%
23971000 | 1 | 1.2%
11920000 | 1 | 1.2%
9274000 | 1 | 1.2%
7026000 | 1 | 1.2%
21489000 | 1 | 1.2%
Other values (70) | 70 | 86.4%

Minimum 5 values

Value | Count | Frequency (%)
2280000 | 1 | 1.2%
3994000 | 1 | 1.2%
4381000 | 1 | 1.2%
4583000 | 1 | 1.2%
4735000 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
39105000 | 1 | 1.2%
35362000 | 1 | 1.2%
31870000 | 1 | 1.2%
23971000 | 1 | 1.2%
22495000 | 1 | 1.2%

Urban area[12] Area (km2)
Real number (ℝ≥0)

MISSING

Distinct: 80
Distinct (%): 100.0%
Missing: 1
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2324.275
Minimum: 238
Maximum: 12093
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 238
5-th percentile: 357.65
Q1: 1025.25
median: 1646
Q3: 3064.75
95-th percentile: 6383.75
Maximum: 12093
Range: 11855
Interquartile range (IQR): 2039.5

Descriptive statistics

Standard deviation: 2095.803456
Coefficient of variation (CV): 0.9017020172
Kurtosis: 5.714701572
Mean: 2324.275
Median Absolute Deviation (MAD): 747
Skewness: 2.103572389
Sum: 185942
Variance: 4392392.126
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
5429 | 1 | 1.2%
5278 | 1 | 1.2%
1614 | 1 | 1.2%
1722 | 1 | 1.2%
1005 | 1 | 1.2%
360 | 1 | 1.2%
290 | 1 | 1.2%
694 | 1 | 1.2%
1147 | 1 | 1.2%
238 | 1 | 1.2%
Other values (70) | 70 | 86.4%

Minimum 5 values

Value | Count | Frequency (%)
238 | 1 | 1.2%
290 | 1 | 1.2%
293 | 1 | 1.2%
313 | 1 | 1.2%
360 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
12093 | 1 | 1.2%
8231 | 1 | 1.2%
7400 | 1 | 1.2%
7006 | 1 | 1.2%
6351 | 1 | 1.2%

Urban area[12] Density (/km2)
Real number (ℝ≥0)

MISSING

Distinct: 80
Distinct (%): 100.0%
Missing: 1
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 8086.8875
Minimum: 734
Maximum: 36928
Zeros: 0
Zeros (%): 0.0%
Memory size: 648.0 B

Quantile statistics

Minimum: 734
5-th percentile: 1323.75
Q1: 4036.5
median: 5743
Q3: 10012.5
95-th percentile: 21464.7
Maximum: 36928
Range: 36194
Interquartile range (IQR): 5976

Descriptive statistics

Standard deviation: 6723.32797
Coefficient of variation (CV): 0.8313863609
Kurtosis: 5.374588538
Mean: 8086.8875
Median Absolute Deviation (MAD): 2355.5
Skewness: 2.098478161
Sum: 646951
Variance: 45203138.99
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
8839 | 1 | 1.2%
16577 | 1 | 1.2%
8141 | 1 | 1.2%
3766 | 1 | 1.2%
1378 | 1 | 1.2%
11627 | 1 | 1.2%
32309 | 1 | 1.2%
9844 | 1 | 1.2%
5033 | 1 | 1.2%
10092 | 1 | 1.2%
Other values (70) | 70 | 86.4%

Minimum 5 values

Value | Count | Frequency (%)
734 | 1 | 1.2%
1049 | 1 | 1.2%
1286 | 1 | 1.2%
1319 | 1 | 1.2%
1324 | 1 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
36928 | 1 | 1.2%
32309 | 1 | 1.2%
25510 | 1 | 1.2%
22010 | 1 | 1.2%
21436 | 1 | 1.2%

Interactions

[Figures: 100 interaction plots between variable pairs; SVG images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
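
The calculation described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code the report itself runs: the function name `pearson_r` is made up for the example, and the 1/n factors of the covariance and the standard deviations are omitted because they cancel in the ratio.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the common 1/n factor cancels below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [2, 4, 6, 8])                   # perfectly linear, increasing
r_down = pearson_r(xs, [8, 6, 4, 2])                 # perfectly linear, decreasing
r_shifted = pearson_r(xs, [3 * x + 10 for x in xs])  # location/scale change of y
```

The last call illustrates the invariance mentioned above: shifting and positively rescaling one variable leaves r unchanged.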

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
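
A minimal sketch of that recipe: rank both variables, then apply Pearson's formula to the ranks. The helper names are hypothetical, and giving tied values the average of their ranks is an assumption of this example (a common convention), not something the report states.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]    # monotonic but not linear
rho = spearman_rho(xs, ys)   # monotonic relationship, so rho reaches 1
r = pearson_r(xs, ys)        # falls short of 1 because the relation is curved
tied = average_ranks([10, 20, 20, 30])
```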

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
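
The pair-counting definition above corresponds to what is usually called tau-a. The sketch below implements exactly that definition; the function name is illustrative. It is an assumption that this matches the report's numbers, since library implementations often use a tie-adjusted variant (tau-b), which differs when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, ties counted as neither."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)


# One swapped pair among four observations: 5 concordant, 1 discordant, 6 pairs.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
```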

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
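
For illustration, here is a sketch of the bias-corrected estimator as it is commonly stated: the chi-squared statistic of the contingency table is divided by n, a bias term is subtracted, and the row and column counts are shrunk before normalising. Whether the report computes it in exactly this way is an assumption; the function name and the nested-list table layout are chosen for the example only.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: subtract the expected value of phi2 under independence.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


v_independent = cramers_v_corrected([[10, 10], [10, 10]])  # no association
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])        # perfect association
```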

Missing values

[Figures: four missing-value plots; SVG images not preserved]

Sample

First rows

Index | City[a] | Country | UN 2018 population estimates[b] | City proper[c] Definition | City proper[c] Population | City proper[c] Area (km2) | City proper[c] Density (/km2) | Metropolitan area[d] Population | Metropolitan area[d] Area (km2) | Metropolitan area[d] Density (/km2) | Urban area[12] Population | Urban area[12] Area (km2) | Urban area[12] Density (/km2)
0 | Tokyo | Japan | 37400068 | Metropolis prefecture | 13515271.0 | 2191.0 | 6169.0 | 37274000.0 | 13452.0 | 2771.0 | 39105000.0 | 8231.0 | 4751.0
1 | Delhi | India | 28514000 | Capital City | 16753235.0 | 1484.0 | 11289.0 | 29000000.0 | 3483.0 | 8326.0 | 31870000.0 | 2233.0 | 14272.0
2 | Seoul | South Korea | 25674800 | Special city | 10013781.0 | 605.0 | 16208.0 | 25514000.0 | 11704.0 | 2180.0 | 22394000.0 | 2769.0 | 8087.0
3 | Shanghai | China | 25582000 | Municipality | 24870895.0 | 6341.0 | 3922.0 | NaN | NaN | NaN | 22118000.0 | 4069.0 | 5436.0
4 | São Paulo | Brazil | 21650000 | Municipality | 12252023.0 | 1521.0 | 8055.0 | 21734682.0 | 7947.0 | 2735.0 | 22495000.0 | 3237.0 | 6949.0
5 | Mexico City | Mexico | 21581000 | City - state | 9209944.0 | 1485.0 | 6202.0 | 21804515.0 | 7866.0 | 2772.0 | 21505000.0 | 2385.0 | 9017.0
6 | Cairo | Egypt | 20076000 | Urban governorate | 9500000.0 | 3085.0 | 3079.0 | NaN | NaN | NaN | 19787000.0 | 2010.0 | 9844.0
7 | Mumbai | India | 19980000 | Municipality | 12478447.0 | 603.0 | 20694.0 | 24400000.0 | 4355.0 | 5603.0 | 22186000.0 | 1008.0 | 22010.0
8 | Beijing | China | 19618000 | Municipality | 21893095.0 | 16411.0 | 1334.0 | NaN | NaN | NaN | 19437000.0 | 4172.0 | 4659.0
9 | Dhaka | Bangladesh | 19578000 | Capital city | 8906039.0 | 338.0 | 26349.0 | 14543124.0 | NaN | NaN | 16839000.0 | 456.0 | 36928.0

Last rows

Index | City[a] | Country | UN 2018 population estimates[b] | City proper[c] Definition | City proper[c] Population | City proper[c] Area (km2) | City proper[c] Density (/km2) | Metropolitan area[d] Population | Metropolitan area[d] Area (km2) | Metropolitan area[d] Density (/km2) | Urban area[12] Population | Urban area[12] Area (km2) | Urban area[12] Density (/km2)
71 | Barcelona | Spain | 5494000 | Municipality | 1620343.0 | 101.0 | 15980.0 | 5474482.0 | NaN | NaN | 4735000.0 | 1072.0 | 4417.0
72 | Johannesburg | South Africa | 5486000 | Metropolitan municipality | NaN | NaN | NaN | NaN | NaN | NaN | 14167000.0 | 4040.0 | 3507.0
73 | Saint Petersburg | Russia | 5383000 | Federal city | NaN | NaN | NaN | NaN | NaN | NaN | 5207000.0 | 1373.0 | 3792.0
74 | Qingdao | China | 5381000 | City (sub - provincial) | NaN | NaN | NaN | NaN | NaN | NaN | 6232000.0 | 1655.0 | 3766.0
75 | Dalian | China | 5300000 | City (sub - provincial) | NaN | NaN | NaN | NaN | NaN | NaN | 3994000.0 | 987.0 | 4047.0
76 | Washington, D.C. | United States | 5207000 | Federal district | 702455.0 | 177.0 | 3969.0 | 6263245.0 | 17009.0 | 368.0 | 7583000.0 | 5501.0 | 1378.0
77 | Yangon | Myanmar | 5157000 | City | NaN | NaN | NaN | NaN | NaN | NaN | 6497000.0 | 603.0 | 10774.0
78 | Alexandria | Egypt | 5086000 | Urban governorate | NaN | NaN | NaN | NaN | NaN | NaN | 4857000.0 | 293.0 | 16577.0
79 | Jinan | China | 5052000 | City (sub - provincial) | 8700000.0 | 10244.0 | 849.0 | NaN | NaN | NaN | 4381000.0 | 798.0 | 5490.0
80 | Guadalajara | Mexico | 5023000 | Municipality | 1385621.0 | 151.0 | 9176.0 | 5286642.0 | 3560.0 | 1485.0 | 5437000.0 | 313.0 | 17371.0